Abstract

The Persistent-Phylogeny Model is an extension of the widely studied Perfect-Phylogeny
Model, encompassing a broader range of evolutionary phenomena. Biological and algorithmic
questions concerning persistent phylogeny have been intensely investigated in recent years.
In this paper, we explore two alternative approaches to the persistent-phylogeny problem
that grow out of our previous work on perfect phylogeny and on galled trees. We develop an
integer programming solution to the Persistent-Phylogeny Problem, empirically explore its
efficiency, and empirically explore the utility of using fast algorithms that recognize
galled trees to recognize persistent phylogeny. The empirical results identify parameter
ranges where persistent phylogenies are galled trees with high frequency, and show that the
integer programming approach can efficiently identify persistent phylogenies of much larger
size than has been previously reported.


1 The Perfect Phylogeny Problem for Binary Characters

The Persistent-Phylogeny Model is an extension of the Perfect-Phylogeny Model, so we 
begin with a brief discussion of perfect phylogeny. 

Definition Let M be an n by m matrix representing n taxa in terms of m characters or
traits that describe the taxa. Each character takes on one of two possible states, 0 or 1; a
cell (f, c) of M has a value of one if and only if the state of character c is 1 for taxon f.
Thus the characters of M are binary characters and M is called a binary matrix.

Definition Given an n by m binary-character matrix M for n taxa, a perfect phylogeny for
M with all-zero root sequence is a rooted (directed) tree T with exactly n leaves, obeying
the following properties:

1. Each of the n taxa labels exactly one leaf of T.

2. Each of the m characters labels exactly one edge of T.

3. For any taxon f, the characters that label the edges along the unique path from the
root to the leaf labeled f specify all of the characters that taxon f possesses (i.e.,
whose state is 1).

The key biological assumption that leads to the perfect phylogeny model is that in the
evolutionary history of the taxa, each character mutates from the zero state to the one state
exactly once, and never from the one state back to the zero state. Hence every character
c labels exactly one edge e in a perfect phylogeny T for M, indicating the unique point
in the evolutionary history of the taxa when character c mutates. A character that has
this property is called a perfect character [13]. Of course, most evolutionary
characters are not perfect, but perfect (or near-perfect) characters are sufficiently frequent
to motivate the study of perfect phylogeny. See [11] for a detailed discussion of the
interpretation of perfect phylogenies, and of biological settings where perfect phylogenies
are observed or hypothesized to exist. Additional recent examples of perfect characters and
perfect phylogenies (not discussed in [11]) come from single-cell studies of mutating tumors
(as one of several recent examples, see [?]).

The perfect-phylogeny problem: Given an n by m binary matrix M, determine 
whether there is a perfect phylogeny for M, and if so, build one. 

The perfect-phylogeny problem can be solved in polynomial (even linear) time. The
following theorem is well known and explained in many places; for example, see [?].

Theorem 1.1 (The perfect-phylogeny theorem) Matrix M has a perfect phylogeny (with
all-zero ancestral sequence) if and only if no pair of its columns c, d contains the three binary
pairs 0,1; 1,0; and 1,1.

Any pair of columns that contain all three binary pairs are called conflicted columns, 
and a column that is not conflicted with any other column is called unconflicted. 
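
The condition of Theorem 1.1 is easy to test directly. The following is a minimal Python sketch, not the linear-time algorithm mentioned above: it checks every pair of columns, so it takes time proportional to n times m squared. The function names and the small example matrix are illustrative assumptions.

```python
from itertools import combinations


def conflicted_pairs(matrix):
    """Return every column pair (c, d), c < d, containing 0,1; 1,0; and 1,1."""
    if not matrix:
        return []
    forbidden = {(0, 1), (1, 0), (1, 1)}
    pairs = []
    for c, d in combinations(range(len(matrix[0])), 2):
        # Collect the binary pairs that actually occur in columns c and d.
        seen = {(row[c], row[d]) for row in matrix}
        if forbidden <= seen:
            pairs.append((c, d))
    return pairs


def has_perfect_phylogeny(matrix):
    """Theorem 1.1: a perfect phylogeny exists iff no column pair is conflicted."""
    return not conflicted_pairs(matrix)


# A small example with 6 taxa (rows) and 4 characters (columns).
M = [
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
]
print(conflicted_pairs(M))        # [(0, 1), (0, 2), (0, 3), (1, 2)]
print(has_perfect_phylogeny(M))   # False
```

Columns are numbered from zero here, so the pair (0, 1) refers to the first two characters.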

1.1 Dollo Parsimony and Persistent Phylogeny 

Several extensions of the perfect character model have been proposed in order to address
a wider range of evolutionary phenomena. In the Dollo (Parsimony) model [6] [7] [21], each
character can mutate from the 0 state to the 1 state at most once in the history (as in the
perfect phylogeny model), but the character can mutate from the 1 state back to the 0 state
at any point where the character has state 1. This models evolutionary characters that are
gained with low probability, but that are lost with much higher probability (allowing the
1 to 0 mutation without constraint). The Dollo model is appropriate “for reconstructing
evolution of the gene repertoire of eukaryotic organisms because although multiple,
independent losses of a gene in different lineages are common, multiple gains of the same
gene are improbable [21].”

More recently, a more limited version of the Dollo model was proposed, where any
character can mutate from state 0 to state 1 at most once in the history, and symmetrically,
it can mutate from state 1 to state 0 at most once. This model is called the
Persistent-Phylogeny model, or the Persistent-Perfect-Phylogeny model. It models
evolutionary characters that are gained with low probability, and then are lost with low
(but non-zero) probability. The Persistent-Phylogeny model was first proposed and further
examined in the papers [?].

The Persistent-Phylogeny Problem: Given an n by m binary matrix M, determine
whether M can be represented by a persistent phylogeny, and if so, build one.


1.2 Persistent Phylogeny and Galled Trees 

The complexity of the persistent-phylogeny problem is open - there is no known
polynomial-time algorithm for the problem, and neither has it been shown to be NP-complete.
Of course, if M has a perfect phylogeny, then it has a persistent phylogeny, so a
polynomial-time special case of the persistent-phylogeny problem is the case of data that
has a perfect phylogeny. A more interesting polynomial-time special case is that of a galled
tree, which might be a directed tree, but might be a particular type of directed acyclic
graph (DAG). In detail, a galled tree is a rooted DAG G, where all cycles in the underlying
undirected graph of G are node disjoint. Trivially, every perfect phylogeny is a galled
tree, but many datasets that cannot be represented on a perfect phylogeny can be represented
on a galled tree. A full, formal definition of a galled tree can be found in [?]. The
relationship between persistent phylogeny and galled trees was developed in [12] and also
discussed in [?]. That relationship is summarized as follows:

Theorem 1.2 If binary data M can be represented by a galled tree, then M can be
represented by a persistent phylogeny. Moreover, the galled tree for M can be converted to a
persistent phylogeny for M in linear time.

The question of whether binary data M can be represented by a galled tree has a
polynomial-time solution [?], and a practical implementation as the program galledtree.pl,
which is available through the author's website [9]. The program is very fast, and (as
detailed later in the paper) solved every problem instance examined for this paper in under
one second. Thus, any approach to the general persistent-phylogeny problem need only
concentrate on data that is not representable by a galled tree. This naturally leads to
the question of how frequently data that is representable on a persistent phylogeny is also
representable by a galled tree. This paper, in part, addresses that question through empirical
testing. The results are discussed in Section 4.

2 Solving the Persistent-Phylogeny Problem by Integer 
Programming 

When data M is not representable by a galled tree, it might still be representable by a
persistent phylogeny, and so we would like a practical method to solve instances of the
persistent-phylogeny problem on problem sizes as large as possible. Efforts to develop such
algorithms, and test their efficacy, appear in [?]. Here we develop and study a practical
method that can solve the persistent-phylogeny problem using integer linear programming,
on instances of larger size than have previously been reported. Integer linear programming
has been successful in efficiently solving many hard problems on instances whose size and
structure are of relevance in current applied domains.


3 An ILP Solution to the Persistent-Phylogeny Problem

The persistent-phylogeny problem was shown in [1] to be reducible to a problem called 
the Incomplete perfect-phylogeny with persistent completion (IP-PP) problem. The integer 
programming solution in this paper follows that approach, solving instances of the persistent 
phylogeny problem by solving instances of the IP-PP problem. Next, we define the IP-PP 
problem, and its integer programming formulation. 

Definition [1]: Given a binary matrix M, the extended matrix Me contains two columns,
j1 and j2, for each column j in M. Column j1 of Me is derived from column j in M by
replacing every occurrence of '0' in column j of M with '?' in column j1 of Me. Column
j2 of Me is derived from column j1 by replacing every occurrence of '1' in j1 with '0'. See
Figure 1.
M       Me          M' = Completion of Me

1110    101010??    10101000
0111    ??101010    11101010
0000    ????????    00000000
1010    10??10??    10001000
1100    1010????    10101100
1111    10101010    10101010


Figure 1: M, Me and M'


Note that for any pair of columns (j1, j2) in Me that are derived from the same column
j in M, column j1 contains a '?' in a row r if and only if column j2 also contains a '?' in
row r. We call such cells twin cells.
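
The construction of the extended matrix is mechanical, and a short Python sketch may make it concrete. The function name and the string encoding of rows are assumptions made for illustration; the input is the matrix M of Figure 1.

```python
def extended_matrix(matrix):
    """Build the extended matrix Me, one string per row.

    Every 1 in M becomes the pair '1', '0' (columns j1 and j2);
    every 0 in M becomes a pair of twin cells '?', '?'.
    """
    extended = []
    for row in matrix:
        cells = []
        for value in row:
            cells.extend(["1", "0"] if value == 1 else ["?", "?"])
        extended.append("".join(cells))
    return extended


# The matrix M of Figure 1.
M = [
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
]
for line in extended_matrix(M):
    print(line)
```

The printed rows agree with the middle column of Figure 1.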

Definition [1]: A completion M' of an extended matrix Me derived from binary matrix M
is obtained by replacing each '?' in Me with '0' or '1', subject to the constraint that for
any column j in M, if Me(r, j1) = ?, then M'(r, j1) = M'(r, j2). That is, the values given
to twin cells must be the same. See Figure 1.

The following theorem, stated and proved in [1], is central to the integer programming 
approach developed in this paper. 

Theorem 3.1 Let Me be the extended matrix obtained from binary matrix M. Then M can
be represented by a persistent phylogeny if and only if there is a completion M' of Me such
that M' can be represented by a perfect phylogeny.

Given M, the IP-PP problem is the problem of determining if there is a completion of
Me that can be represented by a perfect phylogeny.
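
For very small inputs, the IP-PP problem can be decided by exhaustive search, which gives a direct illustration of Theorem 3.1. The Python sketch below is only that, an illustration: it tries every assignment to the twin cells, so its running time is exponential in the number of zeros of M, and it is not the method developed in this paper. The function names are assumptions.

```python
from itertools import combinations, product


def is_perfect(rows):
    """Theorem 1.1 test: no pair of columns contains 0,1; 1,0; and 1,1."""
    for c, d in combinations(range(len(rows[0])), 2):
        seen = {(row[c], row[d]) for row in rows}
        if {(0, 1), (1, 0), (1, 1)} <= seen:
            return False
    return True


def persistent_completion(matrix):
    """Return a completion M' of Me that has a perfect phylogeny, or None."""
    # Each zero of M yields one pair of twin cells, filled with a single value.
    zeros = [(r, j) for r, row in enumerate(matrix)
             for j, value in enumerate(row) if value == 0]
    for choice in product((0, 1), repeat=len(zeros)):
        fill = dict(zip(zeros, choice))
        completion = []
        for r, row in enumerate(matrix):
            new_row = []
            for j, value in enumerate(row):
                if value == 1:
                    new_row += [1, 0]               # fixed cells of Me
                else:
                    new_row += [fill[(r, j)]] * 2   # twin cells share a value
            completion.append(new_row)
        if is_perfect(completion):
            return completion
    return None


# The matrix M of Figure 1 has 10 zeros, so at most 1024 completions are tried.
M = [
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
]
completion = persistent_completion(M)
print(completion is not None)
```

The completion returned need not be the one shown in Figure 1, since several completions of Me can have a perfect phylogeny.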

3.1 The Integer Linear Program for the IP-PP problem 

The integer linear programming approach to solving the IP-PP problem is an extension of 
the ILP formulation for the Incomplete directed perfect phylogeny (IDPP) problem which is 
defined next. 

Given an n by m binary matrix M, with a set of cells K that have missing values,
find {0,1} values to assign to the cells in K so that the resulting matrix M' has
a perfect phylogeny with all-zero ancestral sequence; or determine that there is
no such assignment. When there is such an assignment, we call it a solution to
the IDPP problem.

The IDPP problem actually has a polynomial-time solution, but we do not make
use of it here. Rather, we use an ILP approach to a variant of the IDPP problem from [?]. In
that ILP approach, there is one binary variable Y(i, j) for each cell (i, j) in K. The core
of the ILP formulation specifies linear inequalities that constrain the values given to the
Y variables so that the resulting matrix M' satisfies Theorem 1.1, i.e., the necessary and
sufficient conditions for the data to be representable by a perfect phylogeny. Consequently,
for a dataset M, the IDPP problem has a solution if and only if the corresponding ILP
instance is feasible.
We can easily modify the ILP formulation for the IDPP problem to obtain an ILP
solution to the IP-PP problem. In particular, given matrix M, we build the extended
matrix Me from M, and consider each cell with a '?' to be a cell with a missing value.
Then, we construct the ILP formulation for the IDPP problem for input matrix Me, with
the added equality "Y(r, j1) = Y(r, j2)" for each pair of twin cells (r, j1), (r, j2).

These equalities assure that twin cells receive the same values. We call the resulting ILP 
formulation, modified from an IDPP formulation, the MIDPP formulation for input M. We 
implemented the MIDPP formulation by extending the previously developed software for 
the IDPP problem, available on the author’s webpage [5]. The software (called PERILP.pl, 
written in Perl) that creates an MIDPP formulation given M is also available there. 
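
To make the shape of such a formulation concrete, the Python sketch below generates constraints for a small matrix M. It uses one common way to linearize the condition of Theorem 1.1: for each pair of columns p, q of Me, three binary indicator variables record whether the pairs 0,1; 1,0; and 1,1 occur, and at most two of them may be set. This linearization, the variable names (Y_r_c, B01_p_q, and so on), and the tuple representation of constraints are assumptions made for illustration; the exact inequalities produced by PERILP.pl may differ. No solver is called; the helper `satisfied` only checks a given assignment.

```python
from itertools import combinations


def midpp_constraints(matrix):
    """Return constraints as (coeffs, sense, rhs), coeffs mapping variable -> int."""
    n, m = len(matrix), len(matrix[0])
    constraints = []

    def cell(r, c):
        # Column 2j of Me is j1 and column 2j+1 is j2.
        j, is_j2 = divmod(c, 2)
        if matrix[r][j] == 1:
            return None, (0 if is_j2 else 1)   # known value, no variable
        return f"Y_{r}_{c}", 0                 # missing value, one variable

    def add_lower_bound(bvar, signed_cells, offset):
        # Encode bvar >= sum(sign * cell) + offset, moving constants to the rhs.
        coeffs, rhs = {bvar: 1}, offset
        for sign, (var, const) in signed_cells:
            if var is None:
                rhs += sign * const
            else:
                coeffs[var] = coeffs.get(var, 0) - sign
        constraints.append((coeffs, ">=", rhs))

    for p, q in combinations(range(2 * m), 2):
        b01, b10, b11 = (f"B{t}_{p}_{q}" for t in ("01", "10", "11"))
        for r in range(n):
            x, y = cell(r, p), cell(r, q)
            add_lower_bound(b11, [(1, x), (1, y)], -1)   # forced to 1 if x = y = 1
            add_lower_bound(b10, [(1, x), (-1, y)], 0)   # forced to 1 if x = 1, y = 0
            add_lower_bound(b01, [(-1, x), (1, y)], 0)   # forced to 1 if x = 0, y = 1
        # Theorem 1.1: the three pairs must not all occur in columns p, q.
        constraints.append(({b01: 1, b10: 1, b11: 1}, "<=", 2))

    # Twin cells must receive the same value.
    for r in range(n):
        for j in range(m):
            if matrix[r][j] == 0:
                constraints.append(
                    ({f"Y_{r}_{2 * j}": 1, f"Y_{r}_{2 * j + 1}": -1}, "=", 0))
    return constraints


def satisfied(constraints, values):
    """Check whether the 0/1 assignment `values` meets every constraint."""
    for coeffs, sense, rhs in constraints:
        lhs = sum(k * values[v] for v, k in coeffs.items())
        if sense == ">=" and lhs < rhs:
            return False
        if sense == "<=" and lhs > rhs:
            return False
        if sense == "=" and lhs != rhs:
            return False
    return True
```

For the matrix M of Figure 1, this sketch produces 19 constraints for each of the 28 column pairs of Me, plus 10 twin-cell equalities. Writing the constraints to a file for an ILP solver is left out deliberately.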

In review 

The acronyms are similar and confusing, so here we summarize the integer linear
programming solution to the persistent-phylogeny problem. Binary data M can be represented
by a persistent phylogeny if and only if the extended matrix Me has a completion M' that
can be represented by a perfect phylogeny; if and only if the MIDPP integer programming
formulation for Me has a feasible solution.

4 Empirical Evaluation 

We conducted extensive empirical testing, under two approaches to data generation, and 
differing combinations of parameters, to answer several questions: 

1) How frequently is data that is representable by a persistent phylogeny also
representable by a galled tree?

2) When data is representable by a galled tree, how quickly can a galled tree
be found by specialized galled-tree software, and how quickly can a persistent
phylogeny be found for that data by the ILP approach for the general
persistent-phylogeny problem?

3) When data is representable by a persistent phylogeny, but not a galled tree,
how quickly can a persistent phylogeny for that data be found by the ILP approach?

4) When data is not representable by a persistent phylogeny, how quickly can 
the ILP method determine this? 

Our empirical tests varied n, the number of rows; m, the number of columns; and bp, 
the probability of a back-mutation occurring on an edge. As detailed in the tables below, 
n ranged from 40 to 1000, and m ranged from 30 to 500, and bp was selected from the 
set {0.01,0.05,0.2,0.4}. For each combination of parameters, 50 individual datasets were 
generated, except when n = 1000, where only 25 datasets were generated. 

4.1 Data Generation 

Data was generated in two ways, one that guaranteed data that can be represented by a
persistent phylogeny, and one that only guaranteed that the data can be represented by a
phylogeny under the Dollo model. However, in the second case, the generated data only
infrequently lacked a persistent phylogeny. Thus, the empirical testing was mostly directed
at data that could be represented by a persistent phylogeny.

Here we first describe how data for a single problem instance is generated when the data
is guaranteed to be represented by a persistent phylogeny. To start, the program MS [?]
is used to generate data, denoted D, with a specified number of rows (n) and a specified
number of columns (m) that satisfies the perfect phylogeny condition given in Theorem
1.1. Let T denote the perfect phylogeny that MS generates for D. Then, conceptually,
but with a faster implementation, the algorithm successively walks from the root of T to
each leaf, to determine where any back mutations will occur. In detail, when an edge e is
traversed during the walk of T, the program generates a random number r; and if r < bp,
the program tries to find a character to back mutate at e. It does this by randomly choosing
a character c' that was mutated to state one on an edge e' leading to e, such that character
c' has not yet been back-mutated anywhere in T. If such a character c' is found, then c' is
back-mutated on edge e. The modified tree T is thus a persistent phylogeny for a dataset
M, i.e., the sequences at the leaves of T, generated on the walks in T. Note that despite
being generated on an unconstrained persistent phylogeny with back mutations, M might
still be representable by a galled tree, or a perfect phylogeny.
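
The walk described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Perl program used for the experiments: the tree is supplied by hand rather than produced by MS, "an edge leading to e" is read as an edge strictly above e, and the function name and the dictionary representation of the tree are hypothetical. The `persistent=False` branch uses the relaxed candidate rule of the Dollo model, in which a character only needs to be in state one on the current path.

```python
import random


def add_back_mutations(children, gains, root, bp, rng, persistent=True):
    """Walk the tree from the root and place back mutations on edges.

    children: dict mapping a node to the list of its child nodes
    gains:    dict mapping a node to the characters that mutate 0 -> 1
              on the edge entering that node
    Returns (leaf_states, losses); losses lists (character, node) pairs,
    meaning the character back-mutates on the edge entering that node.
    """
    leaf_states, losses = {}, []
    lost_anywhere = set()

    def walk(node, state, gained_above):
        kids = children.get(node, [])
        if not kids:
            leaf_states[node] = frozenset(state)
        for child in kids:
            child_state = set(state)
            if rng.random() < bp:
                if persistent:
                    # Gained above this edge and never back-mutated anywhere.
                    candidates = gained_above - lost_anywhere
                else:
                    # Dollo: only needs to be in state one on this path.
                    candidates = set(state)
                if candidates:
                    c = rng.choice(sorted(candidates))
                    child_state.discard(c)
                    lost_anywhere.add(c)
                    losses.append((c, child))
            new = set(gains.get(child, ()))
            walk(child, child_state | new, gained_above | new)

    walk(root, set(), set())
    return leaf_states, losses


# A hand-made perfect phylogeny with three leaves x, y, z and four characters.
children = {"root": ["a", "b"], "a": ["x", "y"], "b": ["z"]}
gains = {"a": [0], "b": [1], "x": [2], "z": [3]}
leaves, losses = add_back_mutations(children, gains, "root", 0.2, random.Random(1))
print(leaves, losses)
```

With `persistent=True`, each character appears in `losses` at most once, which is exactly the restriction that separates the persistent-phylogeny model from the Dollo model.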

The same approach is followed when generating data for the Dollo model, but if r < bp
at edge e, we only require that character c' has not been back-mutated on any edge leading
to e. Again, although the data is generated under the unconstrained Dollo model, it is
possible that the data generated is representable on a persistent phylogeny, a galled tree, or
even a perfect phylogeny.

Once M is generated, all unconflicted columns are removed from M, since this makes the
problem instance smaller, and it is known that such removals do not affect the existence or
non-existence of persistent phylogenies. In fact, this is a consequence of a more general fact
about unconflicted columns [?]. Data generation and the ILP solution can also be easily
extended to the case when no ancestral sequence is known, but that discussion is omitted
here.
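
The column-removal step can be sketched as follows. This is a minimal Python illustration with a hypothetical function name; it keeps exactly those columns that are conflicted with at least one other column, using the definition given after Theorem 1.1.

```python
from itertools import combinations


def remove_unconflicted_columns(matrix):
    """Return (kept_column_indices, reduced_matrix)."""
    conflicted = set()
    for c, d in combinations(range(len(matrix[0])), 2):
        seen = {(row[c], row[d]) for row in matrix}
        if {(0, 1), (1, 0), (1, 1)} <= seen:
            conflicted.update((c, d))
    kept = sorted(conflicted)
    return kept, [[row[c] for c in kept] for row in matrix]


# The third column is all ones, so it cannot be conflicted with any column.
example = [
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
]
kept, reduced = remove_unconflicted_columns(example)
print(kept)      # [0, 1]
print(reduced)   # [[1, 0], [0, 1], [1, 1]]
```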

4.2 Empirical Results 

All programs used to generate the test data, create the ILP formulations, and analyze
the results were written in Perl. The ILP solver we used was Gurobi 6.0, running on
a 2.3 GHz MacBook Pro with an Intel Core i7 (four cores, and up to eight threads). The
MacBook ran OS X version 10.9.5. In our trials, whenever a feasible solution to the ILP
was found (so by Theorem 3.1 there should be a persistent phylogeny for the dataset M),
the values of the Y(i, j) variables were used to form a completion, M', of Me, and to build
a perfect phylogeny for M', and a persistent phylogeny for M, verifying constructively that
the programs ran correctly. All of the programs written in Perl are available on the author’s
webpage, and Gurobi is free to academics and researchers. Thus, all the results presented
here can be independently verified (or contradicted - we hope not) by the readers.

Results for data generated on a persistent phylogeny are shown in Tables 1 and 2. Galled-tree
computations were run only on the datasets in the first table, due to a size limitation of
the galled-tree program. Each line in a table shows the results for a particular combination of
parameters. For each combination of parameters, fifty datasets were generated and analyzed
(25 when n = 1000). The first column in the table shows the number of rows (r) and columns
(c) in the dataset. The second column (br) shows the back-mutation rate used to generate the
data (explained in the text). When galled-tree computations were run, the third column
(Gtime) shows the total reported time (two digits after the decimal point) to test the fifty
datasets to see which of them can be represented by a galled tree. Thus, only a fraction of a
second is needed per dataset to check if it can be represented on a galled tree. The fourth
column (data types) shows the number of datasets that can be represented on a perfect
phylogeny (perf); or on a galled tree (gt), but not a perfect phylogeny; or on a persistent
phylogeny (pers), but not a galled tree. Because of the way these data were generated, those
numbers add to the number of datasets tested for that combination of parameters (50 in
Table 1). The fifth column (conflicts) shows the average number of pairwise conflicts
observed in the datasets; these numbers are reported for the gt and pers datasets, in that
order (by Theorem 1.1, datasets with a perfect phylogeny have no conflicts). The sixth column
(ILP-data) gives information about the performance of the ILP solver on the data. The first
number (inf) is the number of datasets that were determined to be infeasible by the ILP
solver, meaning that the data could not be represented on a persistent phylogeny. By the way
these data were generated, that number should be zero in these tables, and so that entry is
only used as a consistency check. The second number in ILP-data is the number of ILP
computations that were interrupted (int) because they exceeded the six-minute time limit.
The third number (tm-gt) is the average time taken by the ILP solver on those datasets that
(independently) were determined to be representable by a galled tree (gt); none of these were
interrupted. The fourth number (tm-pers) in ILP-data is the average time taken by the ILP
solver on those datasets that can be represented by a persistent phylogeny (pers), but not a
galled tree, and where the ILP execution was not interrupted.

Datasets not guaranteed to have a persistent phylogeny. As discussed earlier, we
also generated datasets that might not be representable by a persistent phylogeny, but could
be generated on a tree under the Dollo model. About 10% of the generated datasets were not
representable on a persistent phylogeny. The running times on these data were consistent
with those for the datasets reported earlier, and we omit an explicit discussion here due to
space limitations.

4.3 The most striking results 

The most striking result is how efficiently the ILP approach solves the persistent-phylogeny
problem on the examined data. In data generated from a persistent phylogeny, the
majority of the datasets were solved in under one second, and most were solved in a handful
of seconds. The six-minute time limit was never reached until datasets of 400 taxa and 400
sites. The results also verify that the running times are highly sensitive to the number
of sites, and less sensitive to the number of taxa and the number of conflicts in the data.
These qualitative observations were first reported in [?]. The sizes of the datasets, the
complexities of the datasets (measured in the number of conflicting pairs), and the times
reported compare very favorably to the best unconstrained results reported earlier in the
literature (for example, see the tables numbered 1 in [1] and [?]).

The second most significant result is that many of the datasets that are representable 
by a persistent phylogeny are also representable by a galled tree. Since the time needed 
to determine if a dataset M is representable by a galled tree is typically much less than 
the time needed to determine if M is representable by a persistent phylogeny, the empirical 
results in this paper suggest that when trying to determine if a dataset is representable by 
a persistent phylogeny, one should first determine if M can be represented by a galled tree. 
r; c        br     Gtime   data types       conflicts      ILP-data
                           perf; gt; pers   gt; pers       inf; int; tm-gt; tm-pers

40, 30      0.02   0       39, 10, 1        2.29, 3        0, 0, 0, 0
40, 30      0.05   1       29, 17, 4        1.7, 3.75      0, 0, 0, 0
40, 30      0.1    0       20, 23, 7        1.73, 5        0, 0, 0, 0.01
40, 30      0.2    0       7, 21, 22        1.95, 6.13     0, 0, 0, 0
40, 100     0.02   3       12, 17, 21       7.35, 12.85    0, 0, 0.01, 0.04
40, 100     0.05   3       3, 6, 41         4.66, 24.56    0, 0, 0, 0.16
40, 100     0.1    3       0, 0, 50         0, 33.15       0, 0, 0, 0.34
40, 100     0.2    3       0, 0, 50         0, 48.7        0, 0, 0, 0.62
60, 30      0.02   0       39, 10, 1        1.8, 3         0, 0, 0, 0
60, 30      0.05   0       26, 17, 7        2.11, 4.28     0, 0, 0, 0
60, 30      0.1    0       21, 22, 7        1.63, 4.71     0, 0, 0, 0
60, 30      0.2    0       12, 11, 27       1.81, 5.03     0, 0, 0, 0.01
60, 100     0.02   4       11, 23, 16       4.56, 13.56    0, 0, 0, 0.06
60, 100     0.05   3       1, 6, 43         6, 18.83       0, 0, 0.01, 0.13
60, 100     0.1    4       0, 2, 48         5.5, 36.29     0, 0, 0.03, 0.47
60, 100     0.2    4       0, 1, 49         12, 56.04      0, 0, 0.1, 0.95
100, 30     0.02   0       41, 8, 1         2.62, 5        0, 0, 0, 0
100, 30     0.05   1       33, 16, 1        1.56, 2        0, 0, 0, 0
100, 30     0.1    1       18, 20, 12       1.45, 4.75     0, 0, 0, 0
100, 30     0.2    1       14, 21, 15       2.14, 6        0, 0, 0, 0.01
100, 60     0.02   2       15, 20, 15       2.8, 6.46      0, 0, 0, 0.01
100, 60     0.05   2       10, 16, 24       2.31, 11.5     0, 0, 0, 0.03
100, 60     0.1    1       1, 8, 41         2.37, 11.92    0, 0, 0, 0.04
100, 60     0.2    2       0, 6, 44         3.16, 15.75    0, 0, 0, 0.11
100, 100    0.02   4       8, 16, 26        4.12, 15.96    0, 0, 0, 0.07
100, 100    0.05   5       2, 9, 39         5.11, 19.17    0, 0, 0.01, 0.13
100, 100    0.1    5       0, 0, 50         0, 34.24       0, 0, 0, 0.46
100, 100    0.2    5       0, 0, 50         0, 44.08       0, 0, 0, 1.22
150, 80     0.02   3       13, 37, 0        1.48, 0        0, 0, 0.09, 0
150, 80     0.05   4       0, 0, 50         0, 13.84       0, 0, 0, 0.41
150, 80     0.1    3       0, 0, 50         0, 18.82       0, 0, 0, 0.52
150, 80     0.2    4       1, 0, 49         0, 31.69       0, 0, 0, 1.54
150, 100    0.02   5       9, 0, 41         0, 7.41        0, 0, 0, 0.06
150, 100    0.05   6       2, 0, 48         0, 21.12       0, 0, 0, 0.25
150, 100    0.1    6       0, 0, 50         0, 33.74       0, 0, 0, 2.2
150, 100    0.2    6       5, 0, 45         0, 56.04       0, 0, 0, 3.67


Table 1: Results for datasets with 40, 60, 100 and 150 taxa that are representable by a
persistent phylogeny.

r; c        br     Gtime   data types       conflicts      ILP-data
                           perf; gt; pers   gt; pers       inf; int; tm-gt; tm-pers

200, 100    0.02   -       6, 0, 44         0, 11.45       0, 0, 0, 0.04
200, 100    0.05   -       1, 0, 49         0, 21.69       0, 0, 0, 0.22
200, 100    0.1    -       0, 0, 50         0, 34.86       0, 0, 0, 0.64
200, 100    0.2    -       0, 0, 50         0, 44.34       0, 0, 0, 0.93
200, 200    0.02   -       0, 0, 50         0, 54.9        0, 0, 0, 1.33
200, 200    0.05   -       0, 0, 50         0, 101.78      0, 0, 0, 3.67
200, 200    0.1    -       0, 0, 50         0, 138.56      0, 0, 0, 9.42
200, 200    0.2    -       0, 0, 50         0, 173.78      0, 0, 0, 13.67
200, 250    0.02   -       0, 0, 50         0, 86.86       0, 0, 0, 4.01
200, 250    0.05   -       0, 0, 50         0, 181.5       0, 0, 0, 11.67
200, 250    0.1    -       0, 0, 50         0, 266.95      0, 0, 0, 20.42
200, 250    0.2    -       0, 0, 50         0, 350.8       0, 0, 0, 38.34
400, 200    0.05   -       0, 0, 50         0, 98.26       0, 0, 0, 4.37
400, 200    0.1    -       0, 0, 50         0, 172.72      0, 0, 0, 10.15
400, 200    0.2    -       0, 0, 50         0, 186.36      0, 0, 0, 16.49
400, 300    0.05   -       0, 0, 50         0, 277.72      0, 0, 0, 21.77
400, 300    0.1    -       0, 0, 50         0, 400.8       0, 0, 0, 47.98
400, 300    0.2    -       0, 0, 50         0, 595.98      0, 0, 0, 98.65
400, 400    0.05   -       0, 0, 50         0, 538.22      0, 0, 0, 79.07
400, 400    0.1    -       0, 0, 50         0, 676.76      0, 0, 0, 138.08
400, 400    0.2    -       0, 0, 50         0, 971.5       0, 9, 0, 169.35
400, 450    0.05   -       0, 0, 50         0, 745.28      0, 1, 0, 106.77
400, 450    0.1    -       0, 0, 50         0, 802.2       0, 6, 0, 160.33
400, 450    0.2    -       0, 0, 50         0, 1214.22     0, 18, 0, 203.77
500, 300    0.05   -       0, 0, 50         0, 233.82      0, 0, 0, 22.42
500, 300    0.1    -       0, 0, 50         0, 362.28      0, 0, 0, 49.36
500, 300    0.2    -       0, 0, 50         0, 439         0, 0, 0, 66.16
500, 400    0.05   -       0, 0, 50         0, 568.55      0, 0, 0, 74.59
500, 400    0.1    -       0, 0, 50         0, 651.22      0, 2, 0, 124.53
500, 400    0.2    -       0, 0, 50         0, 787.58      0, 7, 0, 163.85
500, 500    0.05   -       0, 0, 50         0, 825.8       0, 3, 0, 146.48
500, 500    0.1    -       0, 0, 50         0, 1234.92     0, 23, 0, 189.37
500, 500    0.2    -       0, 0, 50         0, 1378.42     0, 30, 0, 223.14
1000, 500   0.05   -       0, 0, 25         0, 609.24      0, 3, 0, 150.16
1000, 500   0.1    -       0, 0, 25         0, 1199.32     0, ?, 0, 204.81
1000, 500   0.2    -       0, 0, 25         0, 938.84      0, 12, 0, 217.7


Table 2: Results for datasets with 200 through 1000 taxa that are representable by a
persistent phylogeny. Galled-tree computations were not run on these data.